Abstract

Pbit, besides its simplicity, is definitely the fastest list sorting algorithm. It considerably surpasses all already known methods. Among its many advantages, it is stable, linear and can be made to run in place. I will compare Pbit with the algorithm described by Donald E. Knuth in the third volume of "The Art of Computer Programming" and with other list sorting algorithms.



1 Introduction to lists 

A list is a set of independent data items, very often called nodes. Individual nodes are connected with each other by means of pointers. Nodes are usually created dynamically, which makes economical use of memory. We will deal with singly-linked lists because Pbit was created for this type of list (which does not prevent it from sorting doubly-linked lists). In a singly-linked list each node contains a pointer only to the next node in the list. The last pointer is NULL. To remember where the list starts there is a special pointer called the root or the head of the list. The root may also be the first node of the list. A queue or a stack is an example of a singly-linked list.

2 Structure of data representing a node and description of a simple function inserting into the singly-linked list

struct Node{
    Node *next;
    TYPE data;
};

A node is a structure which contains at least one pointer to 'Node' and a 'data' field of any type, on the basis of which the nodes will be sorted.



We know what a particular node of the list looks like, so let us analyse the function inserting elements into it. Function 'push' corresponds to the function pushing elements onto a stack.

void push(Node **root, TYPE value){
    Node *new_Node = new(std::nothrow) Node;
    if(new_Node == NULL){
        //Error - memory overflow.
        return;
    }
    new_Node->data = value;
    new_Node->next = *root;
    *root = new_Node;
}



1. Function 'push' takes two parameters. One is a pointer to the pointer to the first element of the list (root) and the other is the value added to the list.

2. We declare a pointer of 'Node' type and assign to it the memory allocated by the 'new' operator.

3. If the pointer is NULL it means there is no space in memory (memory overflow).

4. The function should report the error and . . .

5. Finish working.

6. Having dealt with storage allocation, we write into the new node the value which was passed as the function argument.

7. The 'next' field of the new node is assigned the address held by the root (head), that is, the address of the first element of the list.

8. The root now points to the new node.
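The steps above can be tried out with a small sketch. It assumes that TYPE is a plain int, and it adds two hypothetical helpers, to_vector and free_list, which are not part of the paper. Unlike the listing above, this push returns a bool so that a caller can notice an allocation failure; that is only an illustrative choice.

```cpp
#include <cassert>
#include <new>
#include <vector>

typedef int TYPE;                 // assumption: keys are plain ints

struct Node {
    Node *next;
    TYPE data;
};

// Same logic as the push described above; returns false on allocation
// failure instead of returning silently (illustrative choice).
bool push(Node **root, TYPE value) {
    assert(root != nullptr);      // the caller must pass the address of its root
    Node *new_Node = new (std::nothrow) Node;
    if (new_Node == nullptr) return false;
    new_Node->data = value;
    new_Node->next = *root;       // the old head becomes the second node
    *root = new_Node;             // the root now points to the new node
    return true;
}

// Hypothetical helper: copy the keys out in list order, head first.
std::vector<TYPE> to_vector(const Node *root) {
    std::vector<TYPE> out;
    for (; root; root = root->next) out.push_back(root->data);
    return out;
}

// Hypothetical helper: release every node of the list.
void free_list(Node *root) {
    while (root) {
        Node *following = root->next;
        delete root;
        root = following;
    }
}
```

Because push inserts at the head, pushing 1, 2 and 3 in that order yields the list 3, 2, 1, which is the stack behaviour mentioned in the text.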



3 The idea of sorting using Pbit 

The whole idea of Pbit is very interesting. Besides its simplicity it is indisputably the fastest algorithm for sorting lists. Among its many advantages, it is stable, linear and does not require additional memory (an "in situ" sorting algorithm).



Split the list according to the bit pattern. Each pattern variation is represented by a separate list. Apply the same sorting function recursively to all the lists, but remember to project the pattern onto the succeeding bits. When the last key has been looked through, merge the sorted fragments of the list.

Pbit is a combination of BucketSort, RadixSort and MergeSort ideas. The bit pattern is a set of K bits. K defines the number of bits which are cut off the binary word. M defines the number of bits in a binary word. For example, the one-byte character type 'char' contains 8 bits. A four-byte number is 32 bits long. Ω is the number of K-element variations of the binary set {0,1}, $\Omega = 2^K$. It determines the number of groups (number of roots) into which the list may be divided during one step. n is the number of sorted elements; it has no influence on the coefficient of the linear Pbit sort. The coefficient "by n" depends only on K (length of the bit pattern) and M (number of bits in the binary word). It can be determined by dividing M by K. If my merging method is used, a further n is added:

$f(n) = \frac{M}{K} \cdot n + n$

In Pbit I use a four-bit pattern (K = 4) for short lists, which means that I split the list into at most sixteen groups ($\Omega = 2^K = 2^4 = 16$).

Constants in the Pbit source code:

K = 4 (a four-bit pattern is excellent for sorting short lists)

Ω - dependent on K; equals $2^K = 16$

Variables (depending on the type of data):

M - depends on the size of the sorted element.

n - depends on the number of elements.

Example: We have a list consisting of eight one-byte elements:

21 -> 3 -> 209 -> 14 -> 156 -> 47 -> 3 -> 214 -> NULL

We group the elements of the list according to the first four bits (K = 4), the most significant ones:

{3,14,3} - {21} - {47} - {156} - {209,214}

Next, we group the elements within each group separately, looking at the following four bits (in the case of a one-byte type, the last ones):

{ {3} - {3} - {14} } - {21} - {47} - {156} - { {209} - {214} }

In two steps n one-byte elements are sorted.
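The grouping in this example can be reproduced with a short sketch. It is illustrative only: it distributes plain keys into vectors, whereas Pbit relinks list nodes. The name group_by_digit and the constant OMEGA (standing for Ω) are assumptions made for this sketch.

```cpp
#include <cassert>
#include <vector>

const unsigned K = 4;                 // pattern length, as in the text
const unsigned OMEGA = 1u << K;       // 2^K = 16 groups

// Hypothetical helper: distribute the keys by the K bits that start at
// bit position `shift` (shift = 4 selects the high half of a byte,
// shift = 0 the low half). Keys keep their relative order in each group.
std::vector<std::vector<unsigned> > group_by_digit(
        const std::vector<unsigned> &keys, unsigned shift) {
    assert(shift < 32);
    std::vector<std::vector<unsigned> > groups(OMEGA);
    for (unsigned key : keys)
        groups[(key >> shift) & (OMEGA - 1)].push_back(key);
    return groups;
}
```

Calling group_by_digit on the eight keys with shift 4 gives the first row of groups, and calling it again on a single group with shift 0 gives the refinement shown in the second row.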

In how many steps will four-byte numbers be sorted?

The number of steps is the number of bits of the sorted element divided by the length of the bit pattern. The length K of the pattern is a constant which equals 4. For four-byte elements the variable M equals 32 (because 4 bytes = 4 * 8 bits = 32 bits). So the coefficient "by n" equals $\frac{M}{K} + 1 = 8 + 1 = 9$. In eight steps the four-byte numbers are sorted. In one more step the lists are merged.

O(n) algorithms such as CountSort or RadixSort require memory proportional to n. How is it possible that Pbit does not? Pbit does not count how many times a number 'x' occurs in the sequence 'L'. It does not compare numbers with one another either. The algorithm is based on a specific grouping of elements and needs only a few bytes for parameters and roots. This is a very small amount of memory, independent of n, the number of elements to be sorted. Precisely, it depends on Ω and on the coefficient M/K.

The maximum amount of occupied storage can be expressed by the formula (assuming the size of a pointer equals four bytes):

$T = (\Omega \cdot 4 + 3 \cdot 4) \cdot \frac{M}{K} + 2 \cdot 4$ bytes

For constant K which equals four, Ω equals sixteen:

$T = (64 + 12) \cdot \frac{M}{4} + 8$ bytes
$T = 19 \cdot M + 8$ bytes

The amount of occupied storage T(M) depends only on the size of the sorted element.

For character types (1 byte = 8 bits) Pbit will require at most T(8) = 160 bytes.
For "integers" (4 bytes = 32 bits) Pbit will require at most T(32) = 616 bytes.
This means that it needs a constant, modest amount of space.
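The storage formula can be checked numerically with a small sketch. The function name max_storage_bytes and the constant POINTER_BYTES are assumptions introduced here, and the four-byte pointer size is the one assumed in the text rather than a property of every platform.

```cpp
#include <cassert>

// Assumption carried over from the text: a pointer occupies four bytes.
const unsigned POINTER_BYTES = 4;

// Upper bound on auxiliary memory in bytes,
//   T = (Omega * 4 + 3 * 4) * M / K + 2 * 4,
// for M-bit keys and a K-bit pattern (M is assumed to be a multiple of K).
unsigned max_storage_bytes(unsigned M, unsigned K) {
    assert(K > 0 && M % K == 0);
    unsigned omega = 1u << K;                      // Omega = 2^K roots per call
    unsigned per_level = omega * POINTER_BYTES     // the root table
                       + 3 * POINTER_BYTES;        // the "3 * 4" term of the formula
    return per_level * (M / K) + 2 * POINTER_BYTES;
}
```

With K = 4 this reproduces the two values quoted above: 160 bytes for 8-bit keys and 616 bytes for 32-bit keys.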
Space usage = $O\left(\frac{M}{K} \cdot \Omega\right)$

$\Omega = 2^K$, K - constant

M defines the number of bits in a binary word (the size of TYPE). M is a constant too. That is why the memory complexity equals O(1).

The greatest advantage of Pbit is that it can be made to run in place. It requires merely O(1) memory!

Why maximum instead of constant amount of memory?

Because "empty" lists are omitted and are not taken into account during successive recursive calls.

What is the order of the algorithm responsible for merging the lists?

It is O(n), because Pbit merges only those fragments of the list which have been sorted. We group list elements until the last key. When the last key has been looked through, all we have to do is merge the sorted fragments of the list. Every few recursive calls (at most M/K) a sorted fragment of the list is merged.

Is linearity the best advantage of Pbit?

I don't think so. O(n) time algorithms will be much faster than O(n lg n) ones only while sorting very long lists.

The greatest advantage of Pbit is that it does not require memory proportional to n (it is non-extensive). The second is stability, and the third is the fact that it can be adapted for sorting "special data formats". In the case of Pbit one cannot talk about expected or pessimistic time complexity. The time complexity is always the same, no matter how many elements there are and how the sequence is arranged (sorted, all elements the same). In fourth position I would put linear time complexity with a small coefficient. Last, but not least, Pbit is simple to implement and its idea is clear.



4 Cutting bits off in binary words 

All quantities in a computer are represented by binary words.

Pbit is based on cutting bits off and sorting elements according to those bits. Unfortunately, not every high-level language has built-in operators which carry out operations on bits. By means of arithmetic operators, however, it is possible to build a function which cuts bits off a binary word. Pbit will use such a function, which in Pascal can be defined as follows:

function bits(number, M: integer): integer;
begin
    bits := (number div (2^M)) mod Ω
end;

M - the bit number from which the pattern in the binary word is determined

Dividing by a power of two is like shifting bits to the right by the exponent. For example, we divide 123/8 = 15.375. 123 in decimal is 01111011 in binary.

We move the bits three positions to the right (three because the third power of two equals eight): 00001111 in binary is 15 in decimal. In Turbo Pascal there is a special function for this purpose. The C language has the built-in operator ">>".

number div 2^M

in C can be written:

number >> M

The result is then divided modulo Ω. The modulo division is carried out in order to read only the K last bits ($\Omega = 2^K = 16$). The first versions of the algorithm for cutting off bits in sorted elements used projection with a bit-field structure. To make the algorithm implementable in other programming languages, not only C/C++, I have chosen the arithmetic/logic method.

bits := (number div (2^M)) mod Ω

In C the operations div and modulo can be changed into faster logical operations:

bits = (number >> M) & (Ω - 1)

assuming $\Omega = 2^K$; the resulting number (Ω - 1) should be written in hexadecimal.

I use a bit shift instead of division by a power of two, and the logical operator 'AND' instead of modulo division, because the statement which computes the index into the root table should be optimized to the maximum. It is this statement that the speed of the function depends on.
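The equivalence of the arithmetic and the logical form can be sketched as follows. The function names bits_arith and bits_logic and the constant OMEGA (standing for Ω) are assumptions made for this sketch, and unsigned is assumed to be 32 bits wide.

```cpp
#include <cassert>

const unsigned K = 4;                  // pattern length used in the text
const unsigned OMEGA = 1u << K;        // Omega = 2^K

// Arithmetic form: (number div 2^M) mod Omega.
unsigned bits_arith(unsigned number, unsigned M) {
    assert(M < 32);
    unsigned power = 1;
    for (unsigned i = 0; i < M; ++i) power *= 2;   // power = 2^M
    return (number / power) % OMEGA;
}

// Logical form: shift right by M, then mask with Omega - 1.
unsigned bits_logic(unsigned number, unsigned M) {
    assert(M < 32);
    return (number >> M) & (OMEGA - 1);
}
```

For the example in the text, shifting 123 right by three positions gives 15, and both functions return 15 for number = 123 and M = 3.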
5 Implementation in C++

 1  Node *Pbit(Node *L, unsigned M, Node *P=NULL){
 2      if(L){
 3          M=M-K;
 4          Node *tab[Ω]={NULL};
 5          for(Node *i,**in; i=L; i->next=*in, *in=i){
 6              in=&tab[(L->data>>M) % Ω];
 7              L=L->next;
            }
 8          if(M) for(int i=0; i<Ω; i++) P=Pbit(tab[i], M, P);
 9          else for(int i=0; i<Ω; i++) P=merge(tab[i], P);
        }
10      return P;
    }



Function takes parameters:
L - the first node of the list to be sorted.
P - "end marker", by default equal to NULL.
M - size of the sorted element expressed in bits.

The function returns a pointer to the first element of the list sorted in decreasing order.

1. Node *Pbit(Node *L, unsigned M, Node *P=NULL);

The function has three parameters, one of which defaults to NULL. Two of the parameters and the return value are pointers to Node. Node is a structure which consists of at least a pointer of 'Node' type and the variable 'data'. The list will be sorted according to the variable (key) 'data'. A broader description of lists and of the structure is given in the first and second sections.

2. After calling the function we check whether the list which Pbit is to sort is not empty. This check is not necessary, but it makes the algorithm faster because empty lists are omitted in the process of grouping. If the list happens to be empty we return the pointer P (end marker), and thus we return a part of the sorted list. Additionally, we could check whether we are sorting a one-element list, because for a one-element list it is not necessary to look through all bits of the sorted element. However, it does not bring any significant benefit.

3. The subtraction of the pattern length from the length of the sorted element is connected with the subsequent grouping of nodes. At the first stage of sorting four-byte numbers M = 32 and the constant K = 4. M = M - K, so M = 28, which means that we read the bits from bit 28 (up to bit 32). The four bits which were read determine the number of the list into which we insert the sorted element.
4. We create the root table of new lists. Sorted nodes will be placed in those lists. Each element of the table is initialized with NULL; table[size]={0, ...} - such initialization of table elements is correct. The null value will be assigned to each element. The compiler may warn about missing initializer values.

5. The pointers declared in the 'for' loop work similarly to the function inserting new elements into the list. The description of the algorithm responsible for this process is in the second section.

6. The table index is the four-bit pattern obtained by reading K = 4 bits starting with bit M of the binary word 'data'. The index value is (L->data>>M) % Ω. It is better to replace the modulo operation with the logical 'AND' operation, (L->data>>M) & (Ω-1), because it is probably five times as fast as the remainder operator. Such an efficiency jump is very important in a loop which decides the speed of the algorithm. If we leave the modulo operator, it is possible that the compiler will change it into a logical 'AND' during optimization. Additionally, it is possible to make the indexing more efficient:

Instead of: in=&tab[ (L->data>>M)&0xf ], we can write
in=tab+( (L->data>>M)&0xf )

Because 'in' is of "pointer to pointer" type,
&tab[index] = &(*(tab+index)) = tab + index

7. Pointer 'L' is advanced to the next element of the list; the node itself is inserted into its new list by the increment expression of the loop.

8. In these two lines we decide whether we carry on sorting or whether it is time to merge the (sorted) lists. If 'M' is greater than zero, it means that we have not looked through all bits yet. This is why we sort all newly created lists according to the bits not yet looked through, each list separately.

9. If 'M' equals zero it means that we have looked through all bits of the sorted numbers. Then we start the process of list merging (function merge and how it works are described in the next section).

10. If the sorted list was empty we return the end marker 'P' (NULL) given in the parameter. If not, we return the pointer 'P' indicating the sorted list.

5.1 Function merge and problem of list merging. 

Function merge was adjusted to the root type. Roots in singly-linked lists can be divided into two groups. Roots of the first group consist of two pointers. One pointer indicates the first element of the list while the other indicates the last element. The second group contains roots which are the first elements of lists; they are nodes. I have chosen the root of the second group. Function merge(A,B) attaches list 'B' to the end of list 'A' and returns the address of the first node of the new list. To do that it goes through the whole list 'A' and changes the link of the last node from 'NULL' to 'B'. Anyone who knows data structures (such as singly-linked lists) very well may claim that it would be better to use lists with roots of the first type: independently of n, merging would always take one operation, whereas with roots of the second type it takes n operations. Looking at sorting as a whole, however, the second way will be faster. Why? In Pbit we split the list more often than we merge it. By using roots of the second type we put successive elements into the lists faster.



Definition of the function responsible for list merging:

 1  Node *merge(Node *A, Node *B){
 2      if(A==0) return B;
 3      Node *temp=A;
 4      while(temp->next) temp=temp->next;
 5      temp->next=B;
 6      return A;
    }



1. Assumption: we attach list 'B' to the end of list 'A'.

2. If list 'A' is empty, we return list 'B'.

3. The temporary variable 'temp' holds the address of the first node of list 'A'.

4. We move to the last element of list 'A'.

5. We change the link of the last element of list 'A' from NULL to 'B'.

6. We return the first element of list 'A'.

Wouldn't it be better to check first if list 'B' is empty?

Why loop through to the last element of list 'A' when, in the case 'B' = NULL, we change nothing? No, it wouldn't, because in most cases list 'B' will not be empty. An additional "if" statement would make the merging time worse.

You might point out that the execution time is $\Theta(n \cdot \frac{M}{K} \cdot n)$, since there are $\frac{M}{K}$ steps and each of the n items is examined in each step, and at worst the merging takes another n steps to get to the end of each list.
Of course not! For example, put a counter in the merge function:
unsigned count;    //global counter

Node *merge(Node *A, Node *B){
    if(A==0) return B;      //we do nothing
    count++;                //first node (maybe the only one)
    Node *temp=A;
    while(temp->next){
        temp=temp->next;
        count++;            //n-th node of the list
    }
    temp->next=B;
    return A;
}



We group list elements until the last key. When the last key has been looked through, all we have to do is merge the sorted fragments of the list. Every few recursive calls (at most M/K) a sorted fragment of the list is merged. Pbit merges and splits the lists by turns.
It is (n * M/K [split]) + n [merge],
not (n * M/K [split]) * n [merge].
count always equals n.
So the algorithm is linear and non-extensive.
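The claim that the counter always ends at n can be checked with a compact, self-contained sketch of the decreasing-order sort. It assumes one-byte keys, uses the '&' form of the index, renames the counter to merge_count, and adds a hypothetical helper link that chains an array of nodes into a list; none of these names come from the paper.

```cpp
#include <cassert>
#include <cstddef>

struct Node { Node *next; unsigned char data; };   // one-byte keys

const unsigned K = 4;              // pattern length
const unsigned OMEGA = 1u << K;    // number of roots per call

unsigned merge_count = 0;          // nodes visited by merge

Node *merge(Node *A, Node *B) {
    if (A == 0) return B;          // nothing to walk through
    ++merge_count;                 // first node of A
    Node *temp = A;
    while (temp->next) { temp = temp->next; ++merge_count; }
    temp->next = B;
    return A;
}

// Decreasing-order Pbit following the listing in section 5.
Node *Pbit(Node *L, unsigned M, Node *P = 0) {
    if (L) {
        assert(M >= K);
        M -= K;
        Node *tab[OMEGA] = {0};
        for (Node *i, **in; (i = L) != 0; i->next = *in, *in = i) {
            in = tab + ((L->data >> M) & (OMEGA - 1));   // root for this node
            L = L->next;
        }
        if (M) for (unsigned i = 0; i < OMEGA; i++) P = Pbit(tab[i], M, P);
        else   for (unsigned i = 0; i < OMEGA; i++) P = merge(tab[i], P);
    }
    return P;
}

// Hypothetical helper: chain an array of nodes into a list in array order.
Node *link(Node *nodes, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : 0;
    return n ? &nodes[0] : 0;
}
```

Every node ends up in exactly one bucket of the last level and is walked over once there, so for the eight-element example list the counter stops at eight.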



6 Differences between sorting in descending and 
ascending order 

When sorting in decreasing order (10 -> 9 -> 8 -> ... -> NULL) the lists are merged from the root which represents the smallest elements up to the root which represents the greatest ones, and the greatest element (node address) is returned.

When sorting in ascending order (8 -> 9 -> 10 -> ... -> NULL) the lists are merged from the greatest to the smallest one. The smallest element is returned.



 1  Node *Pbit(Node *L, unsigned M, Node *P=NULL){
 2      if(L){
 3          M-=K;
 4          Node *tab[Ω]={NULL};
 5          for(Node *i,**in; i=L; i->next=*in,*in=i){
 6              in=tab + ((L->data>>M) & (Ω-1));
 7              L=L->next;
            }
 8          if(M) for(int i=0; i<Ω; i++) P=Pbit(tab[i], M, P);
 9          else for(int i=0; i<Ω; i++) P=merge(tab[i], P);
        }
10      return P;
    }





Lines (8) and (9) of the sorting code change.

In the case of sorting in decreasing order:

8. if(M) for(int i=0; i<Ω; i++) P=Pbit(tab[i], M, P);

9. else for(int i=0; i<Ω; i++) P=merge(tab[i], P);

We begin to split and merge lists from tab[0] (the list containing the smallest elements) and end with the last list (containing the greatest elements). Pointer 'P' indicates the node with the highest value.

In the case of sorting in ascending order:

8. if(M) for(int i=(Ω-1); i>=0; i--) P=Pbit(tab[i], M, P);

9. else for(int i=(Ω-1); i>=0; i--) P=merge(tab[i], P);

We begin to split and merge lists from tab[Ω-1] (the list containing the greatest elements) and end with tab[0] (the list containing the smallest elements). Pointer 'P' indicates the node with the lowest value.
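Both directions can be sketched in one function that differs only in the order in which the roots are visited. The boolean parameter ascending, the constant OMEGA and the helper link are assumptions of this sketch; the paper instead keeps two separate functions. Keys are assumed to be 32-bit unsigned values.

```cpp
#include <cassert>
#include <cstddef>

struct Node { Node *next; unsigned data; };   // assumption: 32-bit unsigned keys

const unsigned K = 4;
const unsigned OMEGA = 1u << K;

Node *merge(Node *A, Node *B) {
    if (A == 0) return B;
    Node *temp = A;
    while (temp->next) temp = temp->next;
    temp->next = B;
    return A;
}

// `ascending` is a hypothetical flag that only selects the direction in
// which the root table is traversed, i.e. the variant of lines (8)/(9).
Node *Pbit(Node *L, unsigned M, bool ascending, Node *P = 0) {
    if (L) {
        assert(M >= K && M % K == 0);
        M -= K;
        Node *tab[OMEGA] = {0};
        for (Node *i, **in; (i = L) != 0; i->next = *in, *in = i) {
            in = tab + ((L->data >> M) & (OMEGA - 1));
            L = L->next;
        }
        for (unsigned n = 0; n < OMEGA; n++) {
            // ascending: start at tab[OMEGA-1]; decreasing: start at tab[0]
            unsigned i = ascending ? OMEGA - 1 - n : n;
            P = M ? Pbit(tab[i], M, ascending, P) : merge(tab[i], P);
        }
    }
    return P;
}

// Hypothetical helper: chain an array of nodes into a list in array order.
Node *link(Node *nodes, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : 0;
    return n ? &nodes[0] : 0;
}
```

A call such as Pbit(list, 32, true) returns a pointer to the smallest element, and Pbit(list, 32, false) returns a pointer to the greatest one.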

7 Negative numbers sorting 

So far we have discussed sorting non-negative integers. To sort a list containing signed numbers we use not one but two calls of the function Pbit. This is why the sorting function needs to know whether the sorted numbers are signed or unsigned. It should also be told how to sort: in ascending or decreasing order.

Node *Pbit(Node *L, int sign_numbers=1, int sort_decreasing=0);

We assume by default that sign_numbers = 1 and sort_decreasing = 0.

sign_numbers = 1
It means that we sort signed numbers.

The problem of determining 'sign_numbers' may be left to the Pbit function, or we may use "Run-Time Type Identification" from the typeinfo.h library.

template<class type> inline int sign_numbers(type){
    return (type)((type)1<<((sizeof(type)<<3)-1))<0;
}

The function works only for integer types!
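As an alternative sketch, the same question can be answered at compile time with the standard library. This is not the author's method; it swaps the shift trick for std::numeric_limits, which avoids shifting a bit into the sign position. The function name is kept only for comparison.

```cpp
#include <cassert>
#include <limits>

// Returns 1 for signed arithmetic types and 0 for unsigned ones.
// The argument is used only to deduce the type T.
template <class T>
inline int sign_numbers(T) {
    return std::numeric_limits<T>::is_signed ? 1 : 0;
}
```

For example, sign_numbers(0) yields 1 because the literal is an int, while sign_numbers(0u) yields 0.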

sort_decreasing = 0

It means that we sort in ascending order.
Any value other than 0 indicates sorting in decreasing order.
Pbit written in pseudo-code

Signed and non-negative (unsigned) numbers are recorded in different ways. For representing signed numbers the so-called "two's complement" is used. To understand the fragment of the code responsible for sorting signed numbers we need only know that the most significant bits of negative numbers are always greater than the most significant bits of positive numbers. This is why Pbit, which looks only at bit patterns, would by itself place negative numbers as if they were greater than positive ones; negative and non-negative numbers are therefore split and sorted separately. The complete Pbit code written in C++ is given in the next section.



if (sign_numbers){
    Sorts signed numbers (positive and negative)
    if (sort_decreasing) Sorts in decreasing order
    else Sorts in ascending order
}
else{
    Sorts unsigned numbers (non-negative)
    if (sort_decreasing) Sorts in decreasing order
    else Sorts in ascending order
}
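The remark about the most significant bits can be illustrated with one-byte values. The helper names bit_pattern and top_nibble are assumptions made for this sketch.

```cpp
#include <cassert>

// The bit pattern of a signed 8-bit value, read as an unsigned number.
// Converting to unsigned char is well defined and keeps the
// two's-complement pattern.
unsigned bit_pattern(signed char value) {
    return static_cast<unsigned char>(value);
}

// The top K = 4 bits of the pattern, i.e. the first root index that a
// pass over one-byte keys would use.
unsigned top_nibble(signed char value) {
    return bit_pattern(value) >> 4;
}
```

Every negative value has its highest bit set, so its pattern is at least 128 and its top four bits are at least 8, while every non-negative value stays below both limits.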



8 Complete Pbit implementation in C++

 1  Node *Pbit(Node *L, int sign_numbers, int sort_decreasing){
 2      if(L==0) return 0;
 3      unsigned size=sizeof(L->data)<<3;
        //sign_numbers = (1) - sort signed numbers (possibly negative)
        //               (0) - unsigned (non-negative)
 4      if(sign_numbers){
 5          Node *L_unsigned = 0, *L_negative = 0;
            //split list into negative and
            //non-negative (unsigned) numbers
 6          for(Node *temp; temp = L; ){
 7              L = L->next;
 8              if(temp->data < 0){
 9                  temp->next = L_negative; L_negative = temp;
10              }else{ temp->next = L_unsigned; L_unsigned = temp; }
            }
11          return (sort_decreasing) ?
12              Pbit_decreasing(L_unsigned, size,
                    Pbit_decreasing(L_negative, size)) :
13              Pbit_ascending(L_negative, size,
                    Pbit_ascending(L_unsigned, size));
        }
14      else return (sort_decreasing) ? Pbit_decreasing(L, size) :
15                                      Pbit_ascending(L, size);
    }



Pbit_decreasing - we sort the list in decreasing order
                  (returns a pointer to the greatest element).
Pbit_ascending  - we sort the list in ascending order
                  (returns a pointer to the smallest element).



2. We check whether the sorted list 'L' is empty.

3. We determine the size of the sorted element in bits.

4. If the (logical) variable sign_numbers is different from zero, it means that the sorted elements are signed.

5. We create two pointers which will become the roots of new lists: a list representing negative numbers and a list representing non-negative (unsigned) numbers.

6-10. We split the list; the pointers created in (5) are the roots of the new lists.

11. If the variable sort_decreasing is . . .

12. different from zero, we sort in decreasing order (10 -> 9 -> 8 -> ...),

13. if not, we sort in ascending order (8 -> 9 -> 10 -> ...).

14. The variable sign_numbers is equal to zero. It means that we sort "unsigned" (non-negative) numbers.

14-15. If the variable sort_decreasing is different from zero we sort in decreasing order, if not we sort in ascending order.
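The splitting step of lines 5-10 can be isolated in a small sketch. The function split_by_sign is hypothetical; the paper performs the same relinking inline, and keys are assumed here to be plain ints.

```cpp
#include <cassert>

struct Node { Node *next; int data; };   // assumption: signed int keys

// Relinks every node of L onto one of two new lists, as in lines 5-10
// of the complete listing. Each node is pushed at the front of its
// list, so both lists come out in reverse order of arrival.
void split_by_sign(Node *L, Node **L_negative, Node **L_unsigned) {
    assert(L_negative != 0 && L_unsigned != 0);
    *L_negative = 0;
    *L_unsigned = 0;
    for (Node *temp; (temp = L) != 0; ) {
        L = L->next;                                  // advance first
        Node **root = (temp->data < 0) ? L_negative : L_unsigned;
        temp->next = *root;                           // push at the front
        *root = temp;
    }
}
```

Applied to the list 5, -3, 0, -7, 2 it produces the negative list -7, -3 and the non-negative list 2, 0, 5, each of which can then be handed to its own sorting call.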

9 Comparison of list sorting algorithms 

QuickerSort is characterised by a very small O(n lg n) coefficient. The number of its other operations is not significantly greater than the number of comparisons. QuickerSort is based on dividing the list with respect to the first node (root). We divide the list into elements which are equal to, greater than or less than the chosen node. Next, the "biggest" (by value) and the "smallest" lists are sorted recursively. Its most serious disadvantages are instability and the $O(n^2)$ cost in the worst case. In the storage complexity we must take into account memory for the stack and heap implementation. In the worst case the recursion depth is n - 1.
Version 1

Node *merge(Node *t, Node *P){
    if(t==0) return P;
    Node *tmp = t;
    while(tmp->next) tmp=tmp->next;
    tmp->next=P;
    return t;
}

Node *qs(Node *L, Node *P=0){
    if(L==0) return P;
    Node *L1=0, *L2=0, *L3=0;
    L2=L; L=L->next; L2->next=0;
    for(Node *i; i=L; ){
        L=L->next;
        if(L2->data > i->data)
            { i->next=L1; L1=i; }
        else if(L2->data == i->data)
            { i->next=L2; L2=i; }
        else
            { i->next=L3; L3=i; }
    }
    return merge(qs(L3), merge(L2, qs(L1)));
}

Version 2

Node *qs(Node *L, Node **P=0){
    if(L==0) return 0;
    //P may be 0 in the top-level call, so it is checked before use
    if(L->next == 0){ if(P) *P = L; return L; }
    Node *L1=0, *L2=L, *L3=0;
    Node *P1=0, *P2=L, *P3=0;
    L=L->next;
    L2->next=0;
    for(Node *i; i=L; ){
        L=L->next;
        if(L2->data < i->data)
            { i->next=L1; L1=i; }
        else if(L2->data == i->data)
            { i->next=L2; L2=i; }
        else
            { i->next=L3; L3=i; }
    }
    L3=qs(L3, &P3);
    if(L3) P3->next=L2;
    else L3=L2;
    if(L1){
        P2->next=qs(L1, &P1);
        if(P) *P=P1;
    }
    else if(P) *P=P2;
    return L3;
}
n      1e6    2e6    3e6    4e6    5e6    6e6    7e6    8e6    9e6
1.    3305   7811  12348  18567  23253  31045  36693  43333  50893
2.    2043   5107   8152  11376  14811  18516  22593  26257  30025



1. QuickerSort (with a separate merge function), version 1.

2. QuickerSort (without the merge function), version 2 (*).

(*) To make the comparison I used the more efficient version 2.

MergeSort, like QuickerSort, is based on element comparison. MergeSort is best described recursively. The algorithm divides the list into two lists of the same size. Next, both lists are sorted (by MergeSort) separately. To finish sorting the original n-element list, the two sorted halves are merged. Unfortunately, the recursive implementation of the algorithm does not sort "in situ", so we need twice as much memory as is occupied by the unsorted data. On my hardware version 1 of algorithm [6] stopped while sorting a list of over a million elements: it overflowed the stack. This considerably restricts the usefulness of the "fully recursive" implementation of the algorithm. This is why I used for the comparison the partly iterative (version 2), more efficient MergeSort implementation.
Version 1

Node *MergeSort(Node *);
Node *merge(Node *, Node *);
Node *split(Node *);

Node *MergeSort(Node *list){
    if(list==0) return 0;
    if(list->next==0) return list;
    Node *SecondList=split(list);
    return merge(MergeSort(list), MergeSort(SecondList));
}

Node *merge(Node *list1, Node *list2){
    if(list1==0) return list2;
    if(list2==0) return list1;
    if(list1->data <= list2->data){
        list1->next=merge(list1->next, list2);
        return list1;
    }
    else{
        list2->next=merge(list1, list2->next);
        return list2;
    }
}

Node *split(Node *list){
    if(list==0) return 0;
    if(list->next==0) return 0;
    Node *pSecondCell=list->next;
    list->next=pSecondCell->next;
    pSecondCell->next=split(pSecondCell->next);
    return pSecondCell;
}

Version 2

Node *MergeSort(Node *);
Node *merge(Node *, Node *);
Node *split(Node *);

Node *MergeSort(Node *list){
    if(list==0) return 0;
    if(list->next==0) return list;
    Node *SecondList=split(list);
    return merge(MergeSort(list), MergeSort(SecondList));
}

Node *merge(Node *list1, Node *list2){
    if(list1 == 0) return list2;
    if(list2 == 0) return list1;
    Node *list = 0, *last = 0;
    while(list1 && list2){
        Node* p;
        if(list1->data <= list2->data){
            p = list1; list1 = list1->next;
        }else{
            p = list2; list2 = list2->next;
        }
        if(list) last = last->next = p;
        else last = list = p;
    }
    if(list1) last->next = list1;
    else last->next = list2;
    return list;
}

Node *split(Node *list)
{
    Node* list2 = 0;
    while(list && list->next){
        Node* w = list->next->next;
        list->next->next = list2;
        list2 = list->next;
        list->next = w;
        list = list->next;
    }
    return list2;
}
Unfortunately, the recursive MergeSort implementation overflows the stack when sorting a list of over a million elements. For such a list it obtained a time of 6569 (version 2: 4016) milliseconds.

I proposed Psort instead of QuickerSort for sorting lists. The algorithm divides the list into three sub-lists with respect to two nodes: elements less than the first node, elements greater than the first node but less than the second node, and elements greater than the second node. It is excellent for sorting short lists.

Node *Psort(Node *L, Node *P=0){
    if(L==0) return P;
    if(L->next==0) return (L->next=P,L);
    Node *P2, *L2=0;
    {
        Node *i,
             *tmp=(L->data > L->next->data? P2=L,L=L->next : P2=L->next)->next;
        for(P2->next=L->next=0; i=tmp; ){
            tmp=tmp->next;
            if(i->data < L->data){ i->next=L->next; L->next=i; }
            else if(i->data > P2->data){ i->next=L2; L2=i; }
            else{ i->next=P2->next; P2->next=i; }
        }
    }
    L->next=Psort(L->next,P);
    P2->next=Psort(P2->next,L);
    return Psort(L2, P2);
}



Psort2 is a modification of Psort. I implemented it for sorting long lists with frequently repeated values. Wegner [7] describes a similar ternary partitioning algorithm, more efficient than QuickSort.
Node *Psort2(Node *L, Node *P=0){
    if(L==0) return P;
    if(L->next==0) return (L->next=P,L);
    Node *P2, *L2=0;
    {// 2
        Node *PP;
        {// 1
            Node *LL;
            {//
                Node *i,
                     *tmp=(L->data > L->next->data? P2=L,L=L->next : P2=L->next)->next;
                for(PP=P2, LL=L, P2->next=L->next=0; i=tmp; ){//_for
                    tmp=tmp->next;
                    if(i->data < L->data){ i->next=L->next; L->next=i; }
                    else if(i->data > P2->data){ i->next=L2; L2=i; }
                    else{//_else1
                        if(i->data == L->data){i->next=LL; LL=i;}
                        else if(i->data == P2->data){i->next=PP; PP=i;}
                        else{i->next=P2->next; P2->next=i;}
                    }//else1_
                }//for_
            }//
            L->next=Psort2(L->next,P);
            L=LL;
        }// 1
        P2->next=Psort2(P2->next,L);
        P2=PP;
    }// 2
    return Psort2(L2,P2);
}

Pbit sorts by grouping nodes. It belongs to the group of algorithms which I worked out and which use the "end marker" 'P'. It is a linear sorting algorithm with a coefficient dependent on the length of the pattern 'K'. By convention the pattern was set to four bits. This makes sorting both short and long lists effective. It also results in a very small need for memory, depending on the size of the sorted elements and not on their number (it is non-extensive). Each programmer may change the length of the pattern and adapt Pbit to his own use. Such problems as "worst case behaviour" do not occur in the case of Pbit. Thanks to the stability of the algorithm, database programmers do not have to create several lists, assign them to particular data in the structure and sort every list separately. The above facts show why Pbit is considered not only the fastest but also the most suitable algorithm for sorting lists.
Tests were carried out on a computer equipped with an AMD Athlon XP 1800+ processor (1533 MHz). For the tests I filled the list with pseudo-random numbers obtained from the function rand(), RAND_MAX=0x7FFF, from the standard C library. Each time I seeded the pseudo-random number generator with the current time. Every test was carried out ten times; the arithmetic mean of the obtained results was written in the table.
Unfortunately, the recursive MergeSort implementation overflows the stack when sorting a list of over a million elements. For such a list it obtained a time of 6569 milliseconds. I used the iterative MergeSort implementation in the test.

For a ten-million-element list, Pbit sorting with a sixteen-bit pattern is more than seventeen times faster than the most popular MergeSort!

9.1 Point on ctxis n (number of elements) in relation to 
linecir-logcirithmic order 

Probably every linear algorithm has some point on the axis n (number of elements)
below which it is slower than a linear-logarithmic algorithm. Let's check
from which point on the axis n the O(n) Pbit is slower than the O(n lg n) QuickerSort.

For a bit pattern of four bits we sort n four-byte natural numbers:
For K = 4 bits, M = 32 bits, the number of passes is M/K = 8, so

Θ(8n) will be faster than Θ(n log2 n) for n > 256 elements.

However, we cannot compare the coefficients alone, since the most important
operations of the two algorithms take different amounts of time. In respect of time,
QuickerSort differs from Pbit in the statement responsible for dividing the list:



QuickerSort                         Pbit

if(data>x)                          tab[(data>>M) & 0xF]
else if(data==x)
else                                0xF = hex(Ω - 1), example constant

On average, n elements will be      In the case of Pbit, n elements
compared 1.5 n log2 n times.        (relations) will be shifted
                                    (M/K) * n times.


Now we will analyse the time consumption of the operations. Because all ticks
of the clock are the same, we omit them and give only the mean time in
nanoseconds.


Operation "if" in C requires        Operations of bit shift, logical
20 nanoseconds on average.          'AND' and reference to a table
                                    element take 45 nanoseconds in
                                    total on average in C.

n elements are sorted in            n elements are sorted in
20 * 1.5 * n log2 n = 30 n log2 n   45 * 8n = 360n nanoseconds.
nanoseconds.



30 n log2 n > 360n,  n > 0
n log2 n > 12n
log2 n > 12
n > 4096



For n > 4096 elements, Pbit splitting thirty-two-bit numbers with a
four-bit pattern will be faster.

This does not mean, however, that we will notice the difference when sorting fewer
than 4096 elements. A modern computer needs only one hundredth of a second
to sort even ten thousand elements with a linear-logarithmic algorithm. It means
that we should always use Pbit for sorting lists. For long lists we can increase
the length of the pattern. It makes the coefficient smaller and consequently
the algorithm faster. For example, for K = 8 the coefficient M/K equals four,
so 30 n log2 n > 45 * 4n gives log2 n > 6. This is a jump in efficiency: it lowers
the number of nodes above which linear Pbit is always faster than linear-logarithmic
algorithms to sixty-four elements!
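
The break-even arithmetic above can be sketched in a few lines. This is only an illustration of the cost model used in this section (20 ns per comparison, 1.5 n log2 n comparisons, 45 ns per shift-mask-index step, M/K passes); the function name and parameter names are assumptions introduced here, not part of Pbit.

```cpp
#include <cmath>

// Cost model from the text (timings are the ones quoted there):
//   comparison sort: 20 ns * 1.5 * n * log2(n) = 30 * n * log2(n)
//   Pbit:            45 ns * (M / K) * n
// The linear cost is the smaller one when
//   30 * n * log2(n) > 45 * (M / K) * n   <=>   n > 2^(45 * (M / K) / 30).
// key_bits is M and pattern_bits is K (hypothetical parameter names).
double break_even_length(unsigned key_bits, unsigned pattern_bits) {
    const double compare_ns = 20.0 * 1.5;  // per element, per unit of log2(n)
    const double scatter_ns = 45.0;        // per element, per pass
    const double passes = static_cast<double>(key_bits) / pattern_bits;
    return std::exp2(scatter_ns * passes / compare_ns);
}
```

For M = 32 the function gives 4096 for K = 4 and 64 for K = 8, matching the two thresholds derived in the text.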
10 The length of bit pattern



For a longer pattern the algorithm requires more memory.

The maximum amount of occupied storage can be expressed by the formula:
r = (Ω*4 + 3*4) * (M/K) + 2*4 [bytes], where Ω = 2^K

If K = 4, the algorithm needs 616 bytes to sort four-byte numbers,
whereas for K = 8 it requires as much as 4152 bytes.
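
As a quick check of the two figures, the formula can be evaluated directly. The sketch below takes the constants 4, 3 and 2 from the formula as printed (presumably 4-byte pointers and local variables on a 32-bit machine, which is an assumption); the function name is hypothetical.

```cpp
#include <cstddef>

// Evaluates r = (2^K * 4 + 3 * 4) * (M / K) + 2 * 4 in bytes.
// key_bits is M and pattern_bits is K; the constants are copied from the
// formula in the text and are not derived here.
std::size_t pbit_max_storage_bytes(unsigned key_bits, unsigned pattern_bits) {
    const std::size_t table_entries = std::size_t{1} << pattern_bits;  // 2^K
    const std::size_t levels = key_bits / pattern_bits;                // M / K
    return (table_entries * 4 + 3 * 4) * levels + 2 * 4;
}
```

With M = 32 this yields 616 bytes for K = 4 and 4152 bytes for K = 8, the same values as quoted above.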



Constants: K = 8, Ω = 256, hex(Ω - 1) = hex(256 - 1) = hex(255) = 0xFF

Node *Pbit(Node *L, unsigned M, Node *P=NULL){
  if(L){
    M-=8;                       // move on to the next K = 8 bits of the key
    Node *tab[256]={NULL};      // one list head per pattern value

    // scatter: move every node of L to the front of its list in tab
    for(Node *i,**in; (i=L); i->next=*in,*in=i){
      in=tab + ( (L->data>>M) & 0xFF );
      L=L->next;
    }

    if(M) for(int i=0; i<256; i++) P=Pbit(tab[i], M, P);
    else  for(int i=0; i<256; i++) P=merge(tab[i], P);
  }
  return P;
}
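
For readers who want something to compile in isolation, here is a minimal sketch of the same scatter-then-recurse idea. It is not the author's implementation: the key type is assumed to be a 32-bit unsigned integer, `prepend_reversed` is a hypothetical stand-in for the `merge` function defined elsewhere in this paper, the buckets are visited from the highest to the lowest so that the result is ascending, and no attempt is made to preserve stability.

```cpp
#include <cstdint>

struct Node {
    Node *next;
    std::uint32_t data;
};

// Hypothetical helper: moves the nodes of `bucket` one by one onto the
// front of `tail` and returns the new head (this reverses the bucket).
static Node *prepend_reversed(Node *bucket, Node *tail) {
    while (bucket) {
        Node *rest = bucket->next;
        bucket->next = tail;
        tail = bucket;
        bucket = rest;
    }
    return tail;
}

// Sorts `list` in ascending key order and places it in front of `tail`.
// `shift` is the number of key bits not yet examined; it must be a
// positive multiple of 8 (32 for the initial call on 32-bit keys).
Node *msd_list_sort(Node *list, unsigned shift, Node *tail = nullptr) {
    if (!list) return tail;
    shift -= 8;
    Node *tab[256] = {nullptr};
    // Scatter: push every node onto the front of the bucket selected by
    // the current 8-bit digit of its key.
    while (list) {
        Node *rest = list->next;
        Node **bucket = tab + ((list->data >> shift) & 0xFF);
        list->next = *bucket;
        *bucket = list;
        list = rest;
    }
    // Gather: the highest bucket goes in first, so the lowest ends up at
    // the head. Buckets are sorted recursively until no key bits remain.
    for (int i = 255; i >= 0; --i)
        tail = shift ? msd_list_sort(tab[i], shift, tail)
                     : prepend_reversed(tab[i], tail);
    return tail;
}
```

A call such as `head = msd_list_sort(head, 32);` sorts a list of 32-bit keys with at most four levels of recursion. The only extra storage is the 256-entry table on each level, which depends on the key size and not on the number of nodes.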
11 Differences between Pbit and other algorithms
sorting with "bit key"

I will compare Pbit with the algorithms described by Donald E. Knuth.
Address counting sort

In the case of Pbit there is no worst case. If all elements are identical, it takes far
less time to sort the list than for a non-uniform distribution of the keys.

Position interchange

Position sorting by swapping can be used mainly to sort tables. Pbit does not
scan the list from both ends.

Scatter sort

Recursive scatter sort, unlike Pbit, has a very high proportionality factor.

List position sorting

Pbit scatters with respect to the most significant bits, not the least significant
ones. We scatter list elements until the last bit, and the lists scattered that way are
merged. Only sorted fragments of the list are merged.

Position sort in relation to the most significant digit

The algorithm spends too much time on small "chunks". The algorithm sorting in
relation to the least significant digits is relatively efficient. Probably the best
compromise was suggested by M.D. MacLaren in 1966. He recommended
sorting with the least significant digit first, but applying it only to the most
significant digits. After that the sequence is not completely sorted but
almost ordered, so few inversions are left in it. This is why it is possible
to finish sorting with simple insertion. W. Dobosiewicz also dealt with
an algorithm sorting with the most significant digit first until he obtained
short subsequences. Unfortunately, for a non-uniform key distribution the
algorithm has the worst case O(n lg n). The work of Dobosiewicz inspired other
scholars to invent new algorithms based on address counting, the best known of
which is perhaps the scheme worked out by M. Tamminen in 1985: "Let's
assume that all keys are fractions from the range [0..1). First, we scatter all N
records to baskets, putting key K into basket KN/8 (. . . )".

While looking through the described algorithms we can see considerable
differences, concerning mainly the merging of sequence elements. Pbit
merges the lists only when the lists have been sorted. "We group list
elements till the last key. When the last key has been looked through, all we
have to do is merge the sorted fragments of the list. Every few recursive
calls, at most M/K, a sorted fragment of the list is (recursively)
merged." Recursion and the part of the code responsible for merging
sorted fragments of the list make Pbit a very efficient and innovative
algorithm.



if(M) for(int i=0; i<Ω; i++) P=Pbit(tab[i], M, P);
else  for(int i=0; i<Ω; i++) P=merge(tab[i], P);



All algorithms sorting with a "bit key" are very much alike. All
algorithms which sort by comparing elements are also very similar.
However, the way comparison is implemented makes QuickSort faster than
InsertionSort. In the same way, the way sorting with a "bit key" is
implemented makes Pbit the most efficient algorithm for sorting lists.



12 Tables versus list 

What are the most important differences between a table and a list? Firstly,
tables have a fixed size, whereas the size of a list depends on the size of the data
stored there, enlarged by the storage needed to keep a pointer for each of its
elements. Secondly, to change the order of list elements it is enough to change
a few pointers, which is far less expensive than moving whole portions of
memory when the sequence of table elements changes. Thirdly, it is possible to
add new elements to the list and remove elements without changing the position
of the others. These differences lead to the conclusion that if the data set will
change very often, especially when the number of its members is
not predictable, then it is good to store the data set in a list.
Nobody has compared algorithms sorting tables and lists. Why? Because table
elements can be accessed freely. Nodes in singly-linked lists have a pointer
only to the next node, which makes looking through the lists difficult. Moreover,
it is faster to refer to elements of a table than to nodes of a list. So far
the fastest list sorting algorithm was QuickerSort, much slower than
linear-logarithmic table sorting algorithms. It is good to choose tables when we want
quick access to the data and when the data are to be sorted. For the first
time in the history of sorting there is a list sorting algorithm that can compete
with the fastest table sorting algorithms!
For this test I used the quicksort algorithm from the C standard library.



13 Double-linked lists sorting 

We use double-linked lists very often. How can Pbit be adapted to sort such lists?
For sorting double-linked lists we do not change Pbit's code. We only add
a function that repairs the "reverse connections" of the list elements (nodes).
After a (double-linked) list has been sorted with Pbit, the pointers to the previous
nodes are incorrect. The function which repairs the "reverse connections" should
receive the root of the list as a parameter. Then, going through the elements in
turn, it should set each 'back' pointer so that it points to the element immediately before.
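
A minimal sketch of such a repair function is shown below. The node layout (`DNode` with `next`, `back` and `data`) and the function name are assumptions made for illustration; only the 'back' pointer name comes from the text.

```cpp
// Hypothetical double-linked node: `next` is the link Pbit sorts by,
// `back` is the reverse connection that has to be repaired afterwards.
struct DNode {
    DNode *next;
    DNode *back;
    int data;
};

// Walks the list once from the root and makes every `back` pointer
// refer to the node immediately before; the first node gets nullptr.
void repair_back_links(DNode *root) {
    DNode *previous = nullptr;
    for (DNode *node = root; node != nullptr; node = node->next) {
        node->back = previous;
        previous = node;
    }
}
```

The repair is a single linear pass over the `next` links, so it adds only O(n) work on top of the sort itself.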

14 Floating-point numbers sorting 

Floating-point numbers in a digital computer are written in the following form:

l = m * 2^c, l - number, m - mantissa, c - characteristic

In principle, the characteristic matters more than the mantissa for the process of
sorting. This is why we can sort the numbers by mantissa first, and
then once more by characteristic. Such sorting is correct because
Pbit is stable. Remember about conversion, because the shift
operator does not work on floating-point numbers!
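
The conversion mentioned above can be sketched as follows for single precision. The sketch assumes that `float` is a 32-bit IEEE 754 value with a 23-bit stored mantissa and an 8-bit characteristic, and it deals with the two fields only: the sign bit is ignored, so it is meant for non-negative numbers. The function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Copies the bit pattern of a float into an integer, so that the shift
// and mask operators can be applied to it.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// The 23 stored mantissa bits (the key for the first sorting pass).
std::uint32_t mantissa_field(float value) {
    return float_bits(value) & 0x7FFFFFu;
}

// The 8 characteristic bits (the key for the second sorting pass).
std::uint32_t characteristic_field(float value) {
    return (float_bits(value) >> 23) & 0xFFu;
}
```

For example, 1.0f has the bit pattern 0x3F800000, so its characteristic field is 127 and its mantissa field is 0. Using `std::memcpy` instead of a pointer cast avoids aliasing problems when reading the bits.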

Floating-point formats (standard ANSI IEEE 754):

- simple, single precision (SINGLE): m = 23 + 1, c = 8

- extended, single precision (SINGLE, EXTENDED): m >= 32, c >= 11

- simple, double precision (DOUBLE): m = 52 + 1, c = 11

- extended, double precision (DOUBLE, EXTENDED): m >= 64, c >= 15



15 Portability 

Byte order. It is obvious that all modern machines have 8-bit bytes, but
different machines represent objects larger than one byte differently.
For example, integers of type 'short' (in C), which typically have two
bytes, may be stored in memory in two ways: the less significant byte is
at the smaller address (less significant byte first, i.e. little-endian byte order) or,
inversely, at the greater address (more significant byte first, i.e. big-endian
byte order). Although in both cases machines treat
memory as a sequence of words in the same order, they interpret the byte order
within the words differently. This is why it is important to change the lines
of code responsible for looking through bits in Pbit (the version of the
algorithm described in this documentation was implemented on a
little-endian machine).

The problem of the binary layout of floating-point numbers was well described by Daniel
W. Lewis.

The problem of code portability was well described by Kernighan and Pike [13].
#include <stdio.h>
int main(){
    /* assumes a four-byte unsigned long, as on the machines discussed here */
    unsigned long x=0x11223344UL;
    unsigned int size=sizeof(x);
    unsigned char *p=(unsigned char *)&x;
    while(size--) printf("%x, ",*p++);   /* print bytes from the lowest address up */
    return 0;
}



To check how our machine implements the byte order within the word we can
use a simple program (a standard PC is a little-endian machine).



The result of this program on a machine with descending byte order
(Motorola 68K): 11, 22, 33, 44,

On a machine with ascending byte order
(80x86 Intel Architecture): 44, 33, 22, 11,

A PDP-11 (an old-fashioned 16-bit machine) gives the result:

22, 11, 44, 33,



Arithmetic or logical shift. Shifting a signed value to the right
with the operator '>>' may be performed as an arithmetic shift (copies of the sign
bit are placed in the released bits) or as a logical shift (zeros are placed in the
released bits). Fortunately, this problem does not concern Pbit, despite the
fact that I use the shift operator. This is because the logical 'AND' operation
comes after the bit shift and masks off the released bits, so both kinds of shift
give the same result.
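
The argument can be checked with a small sketch. It assumes 32-bit keys and a four-bit pattern, and the function names are hypothetical. As long as the shift plus the pattern length does not exceed the key width, the mask keeps only bits that come from the key itself, so the bits filled in by the shift never reach the result.

```cpp
#include <cstdint>

// Digit extraction on an unsigned key: '>>' is always a logical shift.
unsigned digit_logical(std::uint32_t key, unsigned shift) {
    return (key >> shift) & 0xFu;
}

// Digit extraction on a signed key: '>>' may be an arithmetic shift,
// but the mask discards the bits that the shift filled in.
unsigned digit_signed(std::int32_t key, unsigned shift) {
    return static_cast<unsigned>(key >> shift) & 0xFu;
}
```

For the key 0x87654321, whose sign bit is set, both functions return the same digit for every shift from 0 to 28 in steps of four; for a shift of 28 both give 8.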
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